Skip to content

Update datamigration for rangepartition#462

Open
yi-xmu wants to merge 6 commits intomainfrom
update_datamigration_for_rangepartition
Open

Update datamigration for rangepartition#462
yi-xmu wants to merge 6 commits intomainfrom
update_datamigration_for_rangepartition

Conversation

@yi-xmu
Copy link
Collaborator

@yi-xmu yi-xmu commented Mar 18, 2026

Maintain the range size in ccmap during bucket migration.

  1. Remove the range size info from the ccmap from old bucket owner node group.
    Specifically, after kickout range data for migration, the range size information for the corresponding range ID is deleted.
  2. Update the forward logic for the keys written during the "double-write" phase.

Summary by CodeRabbit

  • New Features

    • Read-only access to an entry's partition identifier exposed via public API.
  • Bug Fixes

    • Improved tracking of keys during range splits and migrations.
    • Migration-clean now removes obsolete range-size metadata and consistently cleans migrated partitions.
    • Better handling of forward/target tracking for moved data to avoid stale state.
  • Chores

    • Consolidated table-type resolution and streamlined related control flow and logging.

yi-xmu added 5 commits March 18, 2026 11:15
1. Adapt kickoutdata operation with new key sharding logic for range partition
2. Fix the execution flow: after processing the range partition table, the hash partition table needs to be checked.
3. Simplify code using a unified interface.
@yi-xmu yi-xmu self-assigned this Mar 18, 2026
@yi-xmu yi-xmu requested a review from liunyl March 18, 2026 03:42
@coderabbitai
Copy link

coderabbitai bot commented Mar 18, 2026

Walkthrough

Adds a public accessor to expose KickoutCcEntryCc partition id, tracks keys on migrating/splitting ranges in write entries, removes range-size metadata during migration-clean paths in TemplateCcMap, and updates transaction/kickout flows to record forward targets and unify table-type resolution.

Changes

Cohort / File(s) Summary
CC Request Interface
tx_service/include/cc/cc_request.h
Added int32_t GetPartitionId() const to KickoutCcEntryCc to expose the internal range_id_.
CC Map / Range-size Cleanup
tx_service/include/cc/template_cc_map.h
Added void RemoveRangeSize(uint32_t partition_id) and wired calls to remove range-size metadata from kickout/migration-clean paths.
Write-set Entry Tracking
tx_service/include/read_write_entry.h
Added bool on_dirty_range_ to WriteSetEntry to mark keys living in splitting/migrating ranges (default false).
Local CC Handler
tx_service/src/cc/local_cc_handler.cpp
Treat CleanRangeDataForMigration the same as CleanRangeData for core assignment and local routing of kickout requests.
Transaction Operation Logic
tx_service/src/tx_operation.cpp
Record range dirty state into write_entry.on_dirty_range_, capture forward-address range id via a new pointer when creating forward entries, and unify TableName::Type derivation across forward/kickout/migration flows with related control-flow adjustments.

Sequence Diagram(s)

mermaid
sequenceDiagram
autonumber
participant TxOp as TxOperation
participant LocalCC as LocalCcHandler
participant CCMap as TemplateCcMap
participant Storage as Storage/Shards

TxOp->>LocalCC: request kickout/forward (includes forward_addr, partition id)
LocalCC->>LocalCC: map CleanType (treat migration as range clean)
LocalCC->>CCMap: complete KickoutCcEntryCc (calls GetPartitionId)
Note right of CCMap: if migration-clean -> RemoveRangeSize(partition_id)
CCMap-->>LocalCC: completion ack
LocalCC->>Storage: finalize deletion/migration of data
TxOp->>TxOp: set write_entry.on_dirty_range_ / store forward-entry range id

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

Suggested labels

bug

Suggested reviewers

  • liunyl
  • lokax

Poem

🐇 I hopped through partitions, light and spry,
I nudged a range and watched it fly.
Sizes cleared and forwards set,
Dirty flags mark paths well-met,
Hooray — migrations tidy, bye-bye! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title broadly references the main change (updating data migration for range partitioning), but lacks specificity about the key improvements like range size cleanup.
Description check ✅ Passed The description clearly explains the two main objectives (range size cleanup and forward logic updates) and provides concrete implementation details, but does not complete the provided template checklist items.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch update_datamigration_for_rangepartition
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@yi-xmu yi-xmu linked an issue Mar 18, 2026 that may be closed by this pull request
4 tasks
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tx_service/include/cc/template_cc_map.h`:
- Around line 6986-6993: The code casts req.GetPartitionId() (int32_t) to
uint32_t and calls RemoveRangeSize when CleanType::CleanRangeDataForMigration,
but only asserts partition_id != INT32_MAX which is disabled in release builds;
add a runtime bounds check to ensure partition_id is non-negative (and not
INT32_MAX if needed) before the static_cast and RemoveRangeSize call to prevent
negative values wrapping to huge uint32_t values. Locate the block handling
CleanType::CleanRangeDataForMigration around req.GetPartitionId(), validate
partition_id >= 0 (and still guard against INT32_MAX if required), and only call
RemoveRangeSize(static_cast<uint32_t>(partition_id)) when the check passes; if
the check fails, log or handle the error consistently with existing patterns in
local_cc_shards.h and cc_request.h.

In `@tx_service/src/tx_operation.cpp`:
- Around line 7993-8013: The code can fall through from the range-table
exhaustion path into the hash-partitioned kickout code with
kickout_hash_partitioned_tbl_it_ still at begin() or equal to
hash_partitioned_tables_snapshot_.cend(), which skips the first hash table or
can advance cend() when the snapshot is empty; before entering the hash kickout
block (the code that checks kickout_hash_partitioned_tbl_it_ and then sets
post_all_bucket_lock_op_ or populates kickout_data_op_), add the same
dropped-table handoff handling used for the range loop: if the range-loop
exhausted tables, update/advance kickout_hash_partitioned_tbl_it_ safely
(checking hash_partitioned_tables_snapshot_.empty()), and if it equals cend()
set post_all_bucket_lock_op_.write_type_ = PostWriteType::PostCommit and call
ForwardToSubOperation(txm, &post_all_bucket_lock_op_) then return; otherwise set
kickout_data_op_.node_group_, table_name_, start_key_, end_key_, bucket_ids_,
clean_type_ and call ForwardToSubOperation(txm, &kickout_data_op_) as currently
done.
- Around line 7993-8013: The error path currently assumes kickout_range_tbl_it_
is valid and logs kickout_range_tbl_it_->first, but when handling
hash-partitioned tables the code sets kickout_data_op_.clean_type_ =
CleanType::CleanBucketData and kickout_range_tbl_it_ is equal to
ranges_in_bucket_snapshot_.cend(), causing undefined behavior on retry; update
the error handling in the retry/error branch to branch on
kickout_data_op_.clean_type_ (or otherwise check kickout_range_tbl_it_ !=
ranges_in_bucket_snapshot_.cend()) and, for CleanType::CleanBucketData use
kickout_data_op_.table_name_ for logging and diagnostics (or log both where
available) instead of dereferencing kickout_range_tbl_it_->first so hash-table
retries no longer dereference cend.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2c22b6fd-c987-4eb1-aa7d-08c8e7b60dbc

📥 Commits

Reviewing files that changed from the base of the PR and between 1a4729b and 94a5786.

📒 Files selected for processing (5)
  • tx_service/include/cc/cc_request.h
  • tx_service/include/cc/template_cc_map.h
  • tx_service/include/read_write_entry.h
  • tx_service/src/cc/local_cc_handler.cpp
  • tx_service/src/tx_operation.cpp

Comment on lines +6986 to +6993
if (req.GetCleanType() == CleanType::CleanRangeDataForMigration)
{
// For data migration, we need to delete the range size info
// for the range that has been migrated to other node group.
int32_t partition_id = req.GetPartitionId();
assert(partition_id != INT32_MAX);
RemoveRangeSize(static_cast<uint32_t>(partition_id));
}
Copy link

@coderabbitai coderabbitai bot Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect KickoutCcEntryCc partition-id type/default/sentinels.
rg -n -C4 '\bclass\s+KickoutCcEntryCc\b|\brange_id_\b|GetPartitionId\s*\(|SetPartitionId\s*\(' tx_service/include/cc/cc_request.h

# Inspect where migration-clean requests set/read the partition id.
rg -n -C3 'CleanRangeDataForMigration|SetPartitionId\s*\(|GetPartitionId\s*\(' tx_service -g '!**/build/**'

# Confirm all RemoveRangeSize callsites and casts.
rg -n -C3 'RemoveRangeSize\s*\(' tx_service/include/cc/template_cc_map.h tx_service -g '!**/build/**'

Repository: eloqdata/tx_service

Length of output: 14477


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find where range_id is set for CleanRangeDataForMigration requests
rg -n -B5 -A5 'clean_type.*CleanRangeDataForMigration|CleanRangeDataForMigration.*range_id' tx_service/src -g '!**/build/**'

# Check the Reset method that sets range_id for KickoutCcEntryCc
rg -n -B3 -A10 'void Reset.*range_id|range_id_\s*=\s*range_id' tx_service/include/cc/cc_request.h | head -60

# Check RemoveRangeSize implementation to confirm it uses uint32_t as key
rg -n -A15 'void RemoveRangeSize\(uint32_t' tx_service/include/cc/template_cc_map.h

# Check if there are other casts from int32_t to uint32_t in similar cleanup paths
rg -n 'static_cast<uint32_t>' tx_service/include/cc/template_cc_map.h | grep -i 'partition\|range'

Repository: eloqdata/tx_service

Length of output: 5027


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find where CleanRangeDataForMigration requests are created (from tx_operation.cpp hint)
rg -n -B10 -A10 'CleanRangeDataForMigration.*=' tx_service/src/tx_operation.cpp | head -80

# Look for where range_id is populated for migration requests
rg -n -B5 'range_id.*=' tx_service/src/tx_operation.cpp | grep -A5 -B5 'kickout\|migration'

# Check function signature for CreateKickoutCcEntry / similar to see range_id source
rg -n 'CreateKickoutCcEntry|PrepareCcRequest' tx_service/src -g '!**/build/**' | head -20

Repository: eloqdata/tx_service

Length of output: 45


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Look at tx_operation.cpp around the migration cleanup code
sed -n '7880,7950p' tx_service/src/tx_operation.cpp

# Search for where range_id is passed to Reset() in migration context
rg -n 'Reset.*range_id|range_id.*Reset' tx_service/src/tx_operation.cpp

# Look for migration-related functions that might create requests
rg -n 'MigrateRange|migration.*range|range.*migration' tx_service/src/tx_operation.cpp | head -20

Repository: eloqdata/tx_service

Length of output: 3167


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find declaration of ranges_in_bucket_snapshot_
rg -n 'ranges_in_bucket_snapshot_' tx_service/src/tx_operation.cpp | head -10

# Look at the type and how it's populated
rg -n -B3 -A3 'ranges_in_bucket_snapshot_.*=' tx_service/src/tx_operation.cpp | head -40

# Check GetTableRangeKeys to understand what range IDs are returned
rg -n 'GetTableRangeKeys' tx_service/include -g '!**/build/**' | head -5

Repository: eloqdata/tx_service

Length of output: 705


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find the declaration of ranges_in_bucket_snapshot_ in class definition
rg -n 'ranges_in_bucket_snapshot_' tx_service/src/tx_operation.h tx_service/include -g '!**/build/**'

# Look at the try_emplace context to see what's being inserted
sed -n '7605,7620p' tx_service/src/tx_operation.cpp

# Look at what populates the container values
sed -n '7620,7640p' tx_service/src/tx_operation.cpp

Repository: eloqdata/tx_service

Length of output: 224


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Get declaration context around line 1504
sed -n '1495,1515p' tx_service/include/tx_operation.h

# Look at where ranges_in_bucket_snapshot_ is populated with range IDs
sed -n '7605,7635p' tx_service/src/tx_operation.cpp

Repository: eloqdata/tx_service

Length of output: 2167


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check if there are existing checks for negative partition/range IDs in similar code paths
rg -n 'partition.*>= 0|range_id.*>= 0|if.*>=.*0.*partition|if.*>=.*0.*range' tx_service/include/cc -g '!**/build/**'

# Check all places where partition_id is cast to uint32_t to see if any validate
rg -n -B2 'static_cast<uint32_t>\(.*partition' tx_service/include/cc/template_cc_map.h

# Look at UpdateRangeSize calls to see if they have bounds checking
rg -n -B5 'UpdateRangeSize\(static_cast<uint32_t>' tx_service/include/cc/template_cc_map.h | head -40

Repository: eloqdata/tx_service

Length of output: 2146


Add runtime bounds check before casting partition_id to uint32_t.

At lines 6990–6992, the code directly casts GetPartitionId() (int32_t) to uint32_t without checking for negative values. The assertion at line 6991 only guards against INT32_MAX and is disabled in release builds. If a negative value reaches this path (through upstream corruption or a bug), it will wrap to a large positive value when cast, erasing the wrong entry from range_sizes_ and leaving stale range-size metadata.

This pattern already exists defensively in the codebase (e.g., local_cc_shards.h:1192, cc_request.h:7836). Add a runtime check consistent with established practice:

Suggested fix
             if (req.GetCleanType() == CleanType::CleanRangeDataForMigration)
             {
                 // For data migration, we need to delete the range size info
                 // for the range that has been migrated to other node group.
                 int32_t partition_id = req.GetPartitionId();
-                assert(partition_id != INT32_MAX);
-                RemoveRangeSize(static_cast<uint32_t>(partition_id));
+                if (partition_id >= 0 && partition_id != INT32_MAX)
+                {
+                    RemoveRangeSize(static_cast<uint32_t>(partition_id));
+                }
+                else
+                {
+                    LOG(ERROR) << "Invalid partition_id in migration cleanup: "
+                               << partition_id;
+                }
             }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (req.GetCleanType() == CleanType::CleanRangeDataForMigration)
{
// For data migration, we need to delete the range size info
// for the range that has been migrated to other node group.
int32_t partition_id = req.GetPartitionId();
assert(partition_id != INT32_MAX);
RemoveRangeSize(static_cast<uint32_t>(partition_id));
}
if (req.GetCleanType() == CleanType::CleanRangeDataForMigration)
{
// For data migration, we need to delete the range size info
// for the range that has been migrated to other node group.
int32_t partition_id = req.GetPartitionId();
if (partition_id >= 0 && partition_id != INT32_MAX)
{
RemoveRangeSize(static_cast<uint32_t>(partition_id));
}
else
{
LOG(ERROR) << "Invalid partition_id in migration cleanup: "
<< partition_id;
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/template_cc_map.h` around lines 6986 - 6993, The code
casts req.GetPartitionId() (int32_t) to uint32_t and calls RemoveRangeSize when
CleanType::CleanRangeDataForMigration, but only asserts partition_id !=
INT32_MAX which is disabled in release builds; add a runtime bounds check to
ensure partition_id is non-negative (and not INT32_MAX if needed) before the
static_cast and RemoveRangeSize call to prevent negative values wrapping to huge
uint32_t values. Locate the block handling CleanType::CleanRangeDataForMigration
around req.GetPartitionId(), validate partition_id >= 0 (and still guard against
INT32_MAX if required), and only call
RemoveRangeSize(static_cast<uint32_t>(partition_id)) when the check passes; if
the check fails, log or handle the error consistently with existing patterns in
local_cc_shards.h and cc_request.h.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller will guarantee that this value is non-negative.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yi-xmu Thanks for the clarification. If the caller guarantees partition_id is non-negative, I’m good to drop the runtime check.

Optional tiny follow-up for maintainability: tighten the existing debug assertion to reflect the invariant and leave a brief comment. For example:

// Caller guarantees non-negative partition_id for migration cleanups.
assert(partition_id >= 0 && partition_id != INT32_MAX);
RemoveRangeSize(static_cast<uint32_t>(partition_id));

If you prefer, I can prep a small patch to do just this (assert + comment) and nothing else.


✏️ Learnings added
Learnt from: yi-xmu
Repo: eloqdata/tx_service PR: 462
File: tx_service/include/cc/template_cc_map.h:6986-6993
Timestamp: 2026-03-18T04:01:54.203Z
Learning: In eloqdata/tx_service, during CleanRangeDataForMigration handling in TemplateCcMap::Execute(KickoutCcEntryCc &req) (tx_service/include/cc/template_cc_map.h), the caller guarantees KickoutCcEntryCc::GetPartitionId() returns a non-negative partition_id; negative IDs will not reach RemoveRangeSize().

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: yi-xmu
Repo: eloqdata/tx_service PR: 438
File: tx_service/include/cc/template_cc_map.h:11969-11981
Timestamp: 2026-03-12T05:24:16.445Z
Learning: In eloqdata/tx_service, TemplateCcMap::UpdateRangeSize(uint32_t,int32_t,bool) in tx_service/include/cc/template_cc_map.h must return true only once when the loaded, non-dirty range size first crosses StoreRange::range_max_size; it must latch the internal need_split flag as true until ResetRangeStatus(partition_id) clears it, and must not overwrite the latched flag back to false on later calls.

Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.

Learnt from: yi-xmu
Repo: eloqdata/tx_service PR: 454
File: tx_service/include/cc/object_cc_map.h:1574-1575
Timestamp: 2026-03-16T02:32:29.159Z
Learning: In `tx_service/include/cc/object_cc_map.h`, the `flags_str` field unpacked from entry tuples in `Execute(UploadBatchCc &req)` is intentionally unused. `ObjectCcMap` is non-range-partitioned (`TemplateCcMap<KeyT, ValueT, false, false>`), so `range_size_flags` carried in `flags_str` have no meaning in this context. The pause/resume logic does not need to track a flags offset for this class.

Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.

Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.

Learnt from: lokax
Repo: eloqdata/tx_service PR: 149
File: src/remote/cc_stream_receiver.cpp:1066-1075
Timestamp: 2025-10-21T06:46:53.700Z
Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.

Learnt from: yi-xmu
Repo: eloqdata/tx_service PR: 454
File: tx_service/include/data_sync_task.h:185-189
Timestamp: 2026-03-16T02:36:20.407Z
Learning: In eloqdata/tx_service, within a `DataSyncTask` (tx_service/include/data_sync_task.h), the member `id_` represents the partition ID of the current partition for range-partitioned tables (and core_idx for hash-partitioned tables). It is the same identifier used as the key in `range_sizes_` by `UpdateRangeSize()` and `ResetRangeStatus()`. Do not confuse `id_` with a separate "range_id" — in the range partition context, `id_` IS the partition_id.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
tx_service/src/tx_operation.cpp (1)

7975-7981: ⚠️ Potential issue | 🔴 Critical

Guard hash-path retry logging from dereferencing ranges_in_bucket_snapshot_.cend().

At Line 7979, error logging always dereferences kickout_range_tbl_it_->first. On hash-table kickout retries, kickout_range_tbl_it_ can be ranges_in_bucket_snapshot_.cend(), causing undefined behavior before retry.

Suggested fix
         if (kickout_data_op_.hd_result_.IsError())
         {
+            std::string_view failed_table_name_sv = "<unknown>";
+            if (kickout_data_op_.table_name_ != nullptr)
+            {
+                failed_table_name_sv = kickout_data_op_.table_name_->StringView();
+            }
             LOG(ERROR) << "Data migration: fail to kickout range data"
                        << ", table name "
-                       << kickout_range_tbl_it_->first.StringView()
+                       << failed_table_name_sv
                        << ", tx_number:" << txm->TxNumber()
                        << ", keep retrying";
             RetrySubOperation(txm, &kickout_data_op_);
             return;
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/src/tx_operation.cpp` around lines 7975 - 7981, The error log
unconditionally dereferences kickout_range_tbl_it_->first which may be equal to
ranges_in_bucket_snapshot_.cend() during hash-table kickout retries; update the
block that checks kickout_data_op_.hd_result_.IsError() to first test whether
kickout_range_tbl_it_ != ranges_in_bucket_snapshot_.cend() and only append
kickout_range_tbl_it_->first.StringView() when valid, otherwise log a safe
placeholder (e.g. "<no-range-iterator>") or omit the table name; keep other
context like txm->TxNumber() and the retry message intact so no dereference
occurs when the iterator is cend().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tx_service/include/cc/template_cc_map.h`:
- Around line 6986-6993: The assertion in the CleanRangeDataForMigration branch
is too strict and disallows a valid partition_id == 0; update the check where
you read partition_id from req.GetPartitionId() (inside the
CleanType::CleanRangeDataForMigration block) to allow zero by asserting
partition_id >= 0 && partition_id != INT32_MAX, then continue to call
RemoveRangeSize(static_cast<uint32_t>(partition_id)) as before; ensure you only
change the assert condition (not the cast or RemoveRangeSize call) so negative
values and INT32_MAX remain guarded.

In `@tx_service/src/tx_operation.cpp`:
- Around line 735-746: The code captures a raw pointer new_bucket_range_id_ptr
to an element in write_entry.forward_addr_ after a try_emplace, but subsequent
try_emplace calls (same map) can rehash and invalidate that pointer; change the
logic to store the map key ((new_bucket_ng << 10) | residual) instead of a raw
pointer and, wherever new_bucket_range_id_ptr is later dereferenced (uses around
the later try_emplace sites and at lines noted), perform a fresh lookup in
write_entry.forward_addr_ (e.g., find(key) or operator[] if creating) to obtain
the element (or its .first) before reading/writing it; update all uses that
currently rely on new_bucket_range_id_ptr (and similar pointer usages) to do a
key-based re-lookup to ensure safety across rehashes.

---

Duplicate comments:
In `@tx_service/src/tx_operation.cpp`:
- Around line 7975-7981: The error log unconditionally dereferences
kickout_range_tbl_it_->first which may be equal to
ranges_in_bucket_snapshot_.cend() during hash-table kickout retries; update the
block that checks kickout_data_op_.hd_result_.IsError() to first test whether
kickout_range_tbl_it_ != ranges_in_bucket_snapshot_.cend() and only append
kickout_range_tbl_it_->first.StringView() when valid, otherwise log a safe
placeholder (e.g. "<no-range-iterator>") or omit the table name; keep other
context like txm->TxNumber() and the retry message intact so no dereference
occurs when the iterator is cend().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fd7c2774-e355-49bf-8f7c-f2e99e0b2bb3

📥 Commits

Reviewing files that changed from the base of the PR and between 94a5786 and ab86fd5.

📒 Files selected for processing (2)
  • tx_service/include/cc/template_cc_map.h
  • tx_service/src/tx_operation.cpp

Comment on lines +6986 to +6993
if (req.GetCleanType() == CleanType::CleanRangeDataForMigration)
{
// For data migration, we need to delete the range size info
// for the range that has been migrated to other node group.
int32_t partition_id = req.GetPartitionId();
assert(partition_id > 0 && partition_id != INT32_MAX);
RemoveRangeSize(static_cast<uint32_t>(partition_id));
}
Copy link

@coderabbitai coderabbitai bot Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Allow partition 0 in the migration cleanup assertion.

At Line 6991, assert(partition_id > 0 && partition_id != INT32_MAX); is stricter than the established invariant and can trip debug builds for a valid partition_id == 0.

Suggested fix
-                assert(partition_id > 0 && partition_id != INT32_MAX);
+                assert(partition_id >= 0 && partition_id != INT32_MAX);
                 RemoveRangeSize(static_cast<uint32_t>(partition_id));

Based on learnings: during CleanRangeDataForMigration, the caller guarantees KickoutCcEntryCc::GetPartitionId() is non-negative.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/template_cc_map.h` around lines 6986 - 6993, The
assertion in the CleanRangeDataForMigration branch is too strict and disallows a
valid partition_id == 0; update the check where you read partition_id from
req.GetPartitionId() (inside the CleanType::CleanRangeDataForMigration block) to
allow zero by asserting partition_id >= 0 && partition_id != INT32_MAX, then
continue to call RemoveRangeSize(static_cast<uint32_t>(partition_id)) as before;
ensure you only change the assert condition (not the cast or RemoveRangeSize
call) so negative values and INT32_MAX remain guarded.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yi-xmu Got it—leaving the assert as-is. If the invariant ever changes and a debug hit shows partition_id can be 0 during migration cleanup, we can relax it to >= 0 at that time.


🧠 Learnings used
Learnt from: yi-xmu
Repo: eloqdata/tx_service PR: 462
File: tx_service/include/cc/template_cc_map.h:6986-6993
Timestamp: 2026-03-18T04:01:54.203Z
Learning: In eloqdata/tx_service, during CleanRangeDataForMigration handling in TemplateCcMap::Execute(KickoutCcEntryCc &req) (tx_service/include/cc/template_cc_map.h), the caller guarantees KickoutCcEntryCc::GetPartitionId() returns a non-negative partition_id; negative IDs will not reach RemoveRangeSize().

Learnt from: yi-xmu
Repo: eloqdata/tx_service PR: 438
File: tx_service/include/cc/template_cc_map.h:11969-11981
Timestamp: 2026-03-12T05:24:16.445Z
Learning: In eloqdata/tx_service, TemplateCcMap::UpdateRangeSize(uint32_t,int32_t,bool) in tx_service/include/cc/template_cc_map.h must return true only once when the loaded, non-dirty range size first crosses StoreRange::range_max_size; it must latch the internal need_split flag as true until ResetRangeStatus(partition_id) clears it, and must not overwrite the latched flag back to false on later calls.

Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.

Learnt from: yi-xmu
Repo: eloqdata/tx_service PR: 454
File: tx_service/include/cc/object_cc_map.h:1574-1575
Timestamp: 2026-03-16T02:32:29.159Z
Learning: In `tx_service/include/cc/object_cc_map.h`, the `flags_str` field unpacked from entry tuples in `Execute(UploadBatchCc &req)` is intentionally unused. `ObjectCcMap` is non-range-partitioned (`TemplateCcMap<KeyT, ValueT, false, false>`), so `range_size_flags` carried in `flags_str` have no meaning in this context. The pause/resume logic does not need to track a flags offset for this class.

Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.

Learnt from: lokax
Repo: eloqdata/tx_service PR: 149
File: src/remote/cc_stream_receiver.cpp:1066-1075
Timestamp: 2025-10-21T06:46:53.700Z
Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.

Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.

update comment
@yi-xmu yi-xmu force-pushed the update_datamigration_for_rangepartition branch from ab86fd5 to 48e5817 Compare March 18, 2026 04:45
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tx_service/src/tx_operation.cpp (1)

7894-7899: Extract the range-table reconstruction into one helper.

The TableName::Type(...) + kickout_range_table_ rebuild now lives in four branches. Pulling that into a helper would keep the retry/drop paths aligned and make future table-type changes one edit instead of four.

Also applies to: 7927-7934, 8022-8029, 8077-8084

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/src/tx_operation.cpp` around lines 7894 - 7899, Several branches
duplicate the TableName::Type(...) + kickout_range_table_ reconstruction (used
where kickout_range_table_ is set from kickout_range_tbl_it_->first); extract
that logic into a single helper (e.g., BuildTableNameFromSlice or
ResolveTableName) that accepts the key/slice or iterator key (the value passed
to TableName::Type and StringView()) and returns a TableName constructed with
the resolved Type and Engine; replace the four duplicated blocks (the current
uses of TableName::Type(...) and kickout_range_table_ assignment plus the
similar code at the other noted locations) with calls to this helper so
retry/drop paths share the same construction logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tx_service/src/tx_operation.cpp`:
- Around line 7894-7899: Several branches duplicate the TableName::Type(...) +
kickout_range_table_ reconstruction (used where kickout_range_table_ is set from
kickout_range_tbl_it_->first); extract that logic into a single helper (e.g.,
BuildTableNameFromSlice or ResolveTableName) that accepts the key/slice or
iterator key (the value passed to TableName::Type and StringView()) and returns
a TableName constructed with the resolved Type and Engine; replace the four
duplicated blocks (the current uses of TableName::Type(...) and
kickout_range_table_ assignment plus the similar code at the other noted
locations) with calls to this helper so retry/drop paths share the same
construction logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 31bcdb08-830c-47ab-9013-f94e9474e773

📥 Commits

Reviewing files that changed from the base of the PR and between ab86fd5 and 48e5817.

📒 Files selected for processing (2)
  • tx_service/include/cc/template_cc_map.h
  • tx_service/src/tx_operation.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • tx_service/include/cc/template_cc_map.h

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Update data migration with new key sharding logic for range partition.

1 participant